Instrument recognition is a core task in music information retrieval, and in recent years machine learning-based methods have become the primary approach to it. However, existing models often struggle to identify multiple instruments accurately in music tracks that vary in length and quality. One key issue is that the instruments of interest may not appear in every clip of an audio sample, and when they do appear, they are often unevenly distributed across sections of the track. In addition, in polyphonic music several instruments are often played simultaneously, so their signals overlap. When the same overlapping audio signal serves as part of the classification features for several instruments, the features of those instruments become harder to tell apart, which degrades recognition performance. To address these challenges, this paper proposes a multi-instance multi-scale graph attention neural network (MMGAT) with label semantic embeddings for instrument recognition. First, from the perspective of multi-instance learning, MMGAT constructs an instance correlation graph to model the presence and quantitative timbre similarity of instruments at different positions in a track. Second, to make the overlapping signals of different instruments more distinguishable and improve classification accuracy, MMGAT learns semantic information from the instrument labels as embeddings and incorporates them into the overlapping audio signal features. Finally, MMGAT builds an instance-based multi-instance multi-scale graph attention neural network that recognizes different instruments from the instance correlation graphs and label semantic embeddings. MMGAT is validated through experiments and compared with commonly used instrument recognition models.
The experimental results demonstrate that MMGAT outperforms existing approaches in instrument recognition tasks.
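
As an illustration only: the abstract does not specify how the instance correlation graph is constructed. The minimal sketch below assumes that each track is split into clips (instances) with one feature vector per clip, and it uses cosine similarity with a hypothetical threshold as a stand-in for timbre similarity. The function name, the threshold, and the feature layout are assumptions for this example, not the method used in MMGAT.

```python
import numpy as np


def instance_correlation_graph(instances, threshold=0.5):
    """Build a thresholded cosine-similarity adjacency matrix over clip-level instances.

    instances: array of shape (n_instances, n_features), one row per clip (assumed layout).
    threshold: hypothetical cutoff; pairs with similarity below it get no edge.
    """
    x = np.asarray(instances, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.maximum(norms, 1e-12)  # guard against all-zero (silent) clips
    sim = unit @ unit.T                  # pairwise cosine similarity
    adj = np.where(sim >= threshold, sim, 0.0)
    np.fill_diagonal(adj, 1.0)           # every instance is connected to itself
    return adj


# Toy example: clips 0 and 1 have similar features, clip 2 differs from both.
clips = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
adjacency = instance_correlation_graph(clips, threshold=0.5)
```

In this toy example only clips 0 and 1 are linked by an edge, which is the kind of structure a graph attention layer could then use to aggregate evidence across clips that contain the same instrument.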